

then respond to questions raised by individual reviewers. 2.1 Response to common concerns

Neural Information Processing Systems

We thank all reviewers for their positive comments. We will follow up on this. Since this work focuses on theory, we leave numerical studies of our algorithms to future work. We appreciate the minor issues pointed out by the reviewers, and we will fix them in our final paper.


more specific questions raised by each individual reviewer. 2. Common

Neural Information Processing Systems

We thank the reviewers for their comments. All reviewers agree that our theoretical results are solid and well explained. One reviewer remarked that "performing a clustering on the initial models should already give the right clusters". Experiments: We observe that the following common questions about experiments were raised by the reviewers. "Was a separate dataset for parameter evaluation used?": We chose the hyperparameters within a wide range.



then respond to questions raised by individual reviewers. 2.1 Response to common concerns

Neural Information Processing Systems

We thank all reviewers for their positive comments. We remark that this type of assumption is common and standard. We will explore this direction in future work. We will add this discussion in our final paper. We will follow up on this. We will explore this extension in future work.




Here are our responses to individual reviewers. 2. Reviewer #1

Neural Information Processing Systems

We thank the reviewers for providing these helpful comments, especially during the challenging time this year. All the empirical results are for small examples (CIFAR-10), raising the question of scalability. We cite previous work on SAT solver support for cardinality constraints (Liffiton et al. [35]). There are more compact and efficient encodings of cardinality constraints than sequential counters; we use sequential counters only for comparison with other BNN verification research. The main contribution of the paper is the extension of an existing SAT solver (i.e., MiniSAT). Icarte et al. (2019) focuses on generalizability in few-shot learning, which is a different setting from ours.
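For context on the sequential counters mentioned above, the following is a minimal sketch of the standard sequential-counter CNF encoding of an at-most-k cardinality constraint (Sinz, 2005). It is not code from the paper under discussion; the function name and the DIMACS-style integer-literal convention are ours.

```python
from itertools import product

def at_most_k_seq(xs, k, top):
    """Sequential-counter CNF encoding of sum(xs) <= k (Sinz 2005).

    xs  : list of positive DIMACS-style variable ids
    k   : cardinality bound
    top : highest variable id already in use (fresh ids start at top + 1)
    Returns (clauses, new_top); each clause is a list of integer literals.
    """
    n = len(xs)
    if k >= n:
        return [], top  # constraint is trivially satisfied
    # Register s[i][j] means: at least j+1 of xs[0..i] are true (0-based).
    s = [[top + i * k + j + 1 for j in range(k)] for i in range(n - 1)]
    clauses = [[-xs[0], s[0][0]]]               # x0 -> s[0][0]
    clauses += [[-s[0][j]] for j in range(1, k)]  # one var can't reach count 2
    for i in range(1, n - 1):
        clauses.append([-xs[i], s[i][0]])        # xi -> s[i][0]
        clauses.append([-s[i - 1][0], s[i][0]])  # counts never decrease
        for j in range(1, k):
            # xi and count j at step i-1 -> count j+1 at step i
            clauses.append([-xs[i], -s[i - 1][j - 1], s[i][j]])
            clauses.append([-s[i - 1][j], s[i][j]])
        clauses.append([-xs[i], -s[i - 1][k - 1]])  # forbid overflow past k
    clauses.append([-xs[n - 1], -s[n - 2][k - 1]])
    return clauses, top + (n - 1) * k
```

This uses O(n·k) clauses and n·k auxiliary variables, which is exactly the size overhead that motivates the more compact encodings mentioned in the response.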



Avoiding a Tragedy of the Commons in the Peer Review Process

Sculley, D. | Snoek, Jasper | Wiltschko, Alex

arXiv.org Machine Learning

Peer review is the foundation of scientific publication, and the task of reviewing has long been seen as a cornerstone of professional service. However, the massive growth in the field of machine learning has put this community benefit under stress, threatening both the sustainability of an effective review process and the overall progress of the field. In this position paper, we argue that a tragedy of the commons outcome may be avoided by emphasizing the professional aspects of this service. In particular, we propose a rubric to hold reviewers to an objective standard for review quality. In turn, we also propose that reviewers be given appropriate incentive. As one possible such incentive, we explore the idea of financial compensation on a per-review basis. We suggest reasonable funding models and thoughts on long term effects.


How to Calibrate the Scores of Biased Reviewers by Quadratic Programming

Roos, Magnus (Heinrich-Heine-Universität) | Rothe, Jörg (Heinrich-Heine-Universität) | Scheuermann, Björn (Julius-Maximilians-Universität Würzburg)

AAAI Conferences

Peer reviewing is the key ingredient of evaluating the quality of scientific work. Based on the review scores assigned by the individual reviewers to the submissions, program committees of conferences and journal editors decide which papers to accept for publication and which to reject. However, some reviewers may be more rigorous than others, they may be biased one way or the other, and they often have highly subjective preferences over the papers they review. Moreover, each reviewer usually has only a very local view, as he or she evaluates only a small fraction of the submissions. Despite all these shortcomings, the review scores obtained need to be aggregated in order to globally rank all submissions and to make the acceptance/rejection decision. A common method is to simply take the average of each submission's review scores, possibly weighted by the reviewers' confidence levels. Unfortunately, the global ranking thus produced often suffers from a certain unfairness, as the reviewers' biases and limitations are not taken into account. We propose a method for calibrating the scores of reviewers who are potentially biased and blindfolded by having only partial information. Our method uses a maximum likelihood estimator, which estimates both the bias of each individual reviewer and the unknown "ideal" score of each submission. This yields a quadratic program whose solution transforms the individual review scores into calibrated, globally comparable scores. We argue why our method results in a fairer and more reasonable global ranking than simply taking the average of scores. To show its usefulness, we test our method empirically using real-world data.
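The bias-plus-ideal-score model described in the abstract can be sketched as a least-squares (quadratic) problem. The sketch below is our illustrative simplification, not the paper's actual estimator: it models each score as ideal[paper] + bias[reviewer] plus noise, adds a gauge-fixing row forcing biases to sum to zero, and solves the resulting quadratic minimization with an off-the-shelf least-squares routine. All names are ours.

```python
import numpy as np

def calibrate(reviews, n_reviewers, n_papers):
    """Estimate per-paper 'ideal' scores and per-reviewer biases.

    reviews : list of (reviewer_id, paper_id, score) triples
    Minimizes sum((score - ideal[p] - bias[r])^2), a quadratic program,
    with the soft gauge-fixing constraint sum(bias) == 0 added as an
    extra least-squares row so the solution is unique.
    """
    n = n_papers + n_reviewers
    rows, rhs = [], []
    for r, p, score in reviews:
        row = np.zeros(n)
        row[p] = 1.0               # coefficient of ideal[p]
        row[n_papers + r] = 1.0    # coefficient of bias[r]
        rows.append(row)
        rhs.append(score)
    gauge = np.zeros(n)
    gauge[n_papers:] = 1.0         # reviewer biases sum to zero
    rows.append(gauge)
    rhs.append(0.0)
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:n_papers], sol[n_papers:]
```

The calibrated score of a paper is its estimated ideal score. Unlike plain averaging, a uniformly harsh reviewer's bias is subtracted out, so two papers reviewed by reviewers of different severity become comparable.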